Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (receiver operating characteristic area under the curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
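As a rough illustration of why the unadjusted and matched analyses can diverge, the sketch below computes ROC-AUC on a full synthetic cohort and on a subset matched 1:1 within strata of a single binary confounder; the data, column names, and matching scheme are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: unadjusted ROC-AUC vs. ROC-AUC on a confounder-matched subset.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "covid": rng.integers(0, 2, n),        # PCR outcome (0/1)
    "symptomatic": rng.integers(0, 2, n),  # self-reported symptom flag (confounder)
})
# Synthetic classifier score that partly tracks the confounder rather than infection.
df["score"] = 0.5 * df["covid"] + 1.0 * df["symptomatic"] + rng.normal(0, 1, n)

# Unadjusted analysis: evaluate on the full cohort.
auc_unadjusted = roc_auc_score(df["covid"], df["score"])

# Matched analysis: within each symptom stratum, keep equal numbers of positives
# and negatives so the confounder no longer separates the classes.
matched_parts = []
for _, stratum in df.groupby("symptomatic"):
    pos = stratum[stratum["covid"] == 1]
    neg = stratum[stratum["covid"] == 0]
    k = min(len(pos), len(neg))
    matched_parts.append(pos.sample(k, random_state=0))
    matched_parts.append(neg.sample(k, random_state=0))
matched = pd.concat(matched_parts)
auc_matched = roc_auc_score(matched["covid"], matched["score"])

print(f"ROC-AUC unadjusted: {auc_unadjusted:.3f}, matched: {auc_matched:.3f}")
```

Because the synthetic score leans on the confounder, the matched ROC-AUC drops towards what the infection signal alone supports, mirroring the qualitative pattern reported above.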
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma, and 27.20% with linked influenza PCR test results.
Language models demonstrate both quantitative improvements and new qualitative capabilities as they increase in scale. Despite their potentially transformative impact, these new capabilities remain poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and mitigate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense Transformer architectures, and Switch-style sparse Transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
In this work, we compare the performance of machine learning algorithms applied to the air transport demand modelling problem. Air transport demand forecasting is an important component of planning and management in the industry due to its economic and financial implications. The traditional approach used in airline operations, as prescribed by the International Civil Aviation Organization (ICAO), is a multiple linear regression (MLR) model built on cost variables and economic factors. Here, the performance of models built with artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), genetic algorithms, support vector machines, and regression trees is compared against MLR. ANN and ANFIS achieve the best performance in terms of the lowest mean squared error.
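A minimal sketch of this kind of model comparison, using scikit-learn stand-ins for MLR, a small ANN, an SVM, and a regression tree on synthetic data; the study's actual features, its ANFIS model, and any genetic-algorithm tuning are not reproduced here.

```python
# Compare several regressors against a multiple linear regression baseline by test MSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))   # placeholder cost/economic predictors
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.3 * np.sin(X[:, 0]) + rng.normal(0, 0.2, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "MLR": LinearRegression(),
    "ANN": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0),
    "Regression tree": DecisionTreeRegressor(max_depth=6, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:>16s}: test MSE = {mse:.4f}")
```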
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Neural Representations have recently been shown to effectively reconstruct a wide range of signals from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). Inspired by coordinate inputs of previous neural representation methods, we assign a coordinate to each convolutional kernel in our network based on its position in the architecture, and optimize a predictor network to map coordinates to their corresponding weights. Similarly to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network's weights aids NeRN towards a better reconstruction. In addition, since slight perturbations in pre-trained model weights can result in a considerable accuracy loss, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet. Finally, we present two applications using NeRN, demonstrating the capabilities of the learned representations.
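The coordinate-to-weight idea can be sketched in a few lines: assign each convolutional kernel a coordinate based on its position, then train a small MLP to map coordinates to kernel weights under a reconstruction loss. The layer choice, coordinate normalisation, and predictor size below are assumptions, and the paper's smoothness and distillation terms are omitted.

```python
# Minimal sketch: an MLP maps (layer, filter, channel) coordinates to 3x3 kernels
# of a stand-in "pretrained" convolution and is fit with a reconstruction loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(16, 32, kernel_size=3)     # stand-in pretrained layer
target = conv.weight.detach()               # shape (32, 16, 3, 3)

# One coordinate per kernel, normalised to [0, 1].
out_ch, in_ch = target.shape[:2]
coords = torch.stack(torch.meshgrid(
    torch.arange(out_ch), torch.arange(in_ch), indexing="ij"), dim=-1).reshape(-1, 2).float()
coords = torch.cat([torch.zeros(len(coords), 1), coords], dim=1)   # layer index = 0
coords = coords / coords.max(dim=0, keepdim=True).values.clamp(min=1)

predictor = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                          nn.Linear(128, 128), nn.ReLU(),
                          nn.Linear(128, 9))   # 9 outputs -> one 3x3 kernel
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
flat_target = target.reshape(-1, 9)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(predictor(coords), flat_target)
    loss.backward()
    opt.step()

reconstructed = predictor(coords).detach().reshape(target.shape)
print("final reconstruction MSE:", nn.functional.mse_loss(reconstructed, target).item())
```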
Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoil. Additionally, this detection also helps recommender systems to promote relevant news based on user locations. Note that, when the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because these methods rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and due to the lack of datasets in this area, we also contribute to the research community with a gold-standard multilingual news-location dataset, NewsLOC. It contains annotations of the relevant locations (and their WikiData IDs) for 600+ Wikinews articles in five languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines, and that a version fine-tuned with semi-supervised data further increases the classification rate. The source code and the NewsLOC dataset are publicly available for use by the research community at https://github.com/vsuarezpaniagua/NewsLocation.
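A minimal sketch of the knowledge-base step, assuming a tiny in-memory entity-to-location table and simple string matching in place of a full entity linker; the entries and output format are illustrative only.

```python
# Infer article locations from entities via a small entity -> location knowledge base.
from collections import Counter

# entity -> (inferred location, WikiData ID of the location)
KNOWLEDGE_BASE = {
    "Eiffel Tower": ("Paris", "Q90"),
    "Bundestag": ("Berlin", "Q64"),
    "La Scala": ("Milan", "Q490"),
}

def infer_locations(text: str) -> list[tuple[str, str, int]]:
    """Return (location, wikidata_id, mention_count) for KB entities found in text."""
    counts = Counter()
    for entity, (location, qid) in KNOWLEDGE_BASE.items():
        hits = text.count(entity)
        if hits:
            counts[(location, qid)] += hits
    return [(loc, qid, n) for (loc, qid), n in counts.most_common()]

article = ("Thousands gathered near the Eiffel Tower on Sunday, while a smaller "
           "rally took place outside the Bundestag.")
print(infer_locations(article))
# -> [('Paris', 'Q90', 1), ('Berlin', 'Q64', 1)], even though neither city is named.
```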
In recent years multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANN) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concepts of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs latent representations of state and visual representations for reasoning over events and actions. It learns to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition over an I3D-NL baseline on the CATER dataset.
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question answering (VQA) remains challenging, primarily due to the modality disconnection and task disconnection between the LLM and the VQA task. End-to-end training on vision and language data may bridge these disconnections, but is inflexible and computationally expensive. To address this issue, we propose \emph{Img2Prompt}, a plug-and-play module that provides prompts capable of bridging the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training. To construct such prompts, we employ LLM-agnostic models to describe image content and to generate self-constructed question-answer pairs, which effectively guide the LLM to perform zero-shot VQA. Img2Prompt offers the following benefits: 1) It can flexibly work with various LLMs to perform VQA. 2) Without the need for end-to-end training, it significantly reduces the cost of deploying LLMs for zero-shot VQA tasks. 3) It achieves comparable or better performance than methods that rely on end-to-end training. For example, we outperform Flamingo~\cite{Deepmind:Flamingo2022} by 5.6\% on VQAv2. On the challenging A-OKVQA dataset, our method even outperforms few-shot methods by as much as 20\%.
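One way to picture the prompt-construction step is as plain string assembly: image captions and synthetic question-answer exemplars are concatenated ahead of the test question and handed to a frozen LLM. The template wording, captions, and exemplars below are illustrative assumptions rather than the paper's exact prompts.

```python
# Assemble a zero-shot VQA prompt from captions and synthetic QA exemplars.
def build_vqa_prompt(captions: list[str], exemplar_qas: list[tuple[str, str]],
                     question: str) -> str:
    lines = ["Contexts: " + " ".join(captions), ""]
    for q, a in exemplar_qas:
        lines.append(f"Question: {q}")
        lines.append(f"Answer: {a}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)

prompt = build_vqa_prompt(
    captions=["A brown dog is catching a red frisbee in a park."],
    exemplar_qas=[("What color is the frisbee?", "red"),
                  ("What animal is in the picture?", "dog")],
    question="Where was this photo taken?",
)
print(prompt)   # this string would be passed unchanged to any frozen LLM
```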
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if crowdsourced and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generation to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.
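The teacher-filtering step can be sketched as follows, with a toy rule-based "teacher" standing in for the task-specific model: candidate perturbations are kept only when the teacher assigns the intended counterfactual label with sufficient confidence. The teacher, labels, and threshold are assumptions, not DISCO's actual components.

```python
# Keep generated counterfactuals only if a teacher model confirms the intended label.
from dataclasses import dataclass

@dataclass
class Candidate:
    original: str
    perturbed: str
    target_label: str          # the label the counterfactual is meant to have

def toy_teacher(text: str) -> dict[str, float]:
    """Stand-in teacher: returns a probability per label for a toy sentiment task."""
    positive_words = {"great", "love", "wonderful"}
    score = sum(w in text.lower() for w in positive_words) / 3
    return {"positive": score, "negative": 1 - score}

def filter_counterfactuals(candidates: list[Candidate],
                           threshold: float = 0.6) -> list[Candidate]:
    kept = []
    for c in candidates:
        probs = toy_teacher(c.perturbed)
        if probs.get(c.target_label, 0.0) >= threshold:
            kept.append(c)
    return kept

candidates = [
    Candidate("The food was awful.", "The food was wonderful, I love it.", "positive"),
    Candidate("The food was awful.", "The food was not awful.", "positive"),
]
print([c.perturbed for c in filter_counterfactuals(candidates)])
# Only the first candidate clears the confidence threshold under the toy teacher.
```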